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Abstract —With the increasing availability of various sensor technolo¬ 
gies, we now have access to large amounts of multi-block (also called 
multi-set, multi-relational, or multi-view) data that need to be jointly an¬ 
alyzed to explore their latent connections. Various component analysis 
methods have played an increasingly important role for the analysis 
of such coupled data. In this paper, we first provide a brief review of 
existing matrix-based (two-way) component analysis methods for the 
joint analysis of such data with a focus on biomedical applications. 
Then, we discuss their important extensions and generalization to 
multi-block multiway (tensor) data. We show how constrained multi¬ 
block tensor decomposition methods are able to extract similar or sta¬ 
tistically dependent common features that are shared by all blocks, by 
incorporating the multiway nature of data. Special emphasis is given to 
the flexible common and individual feature analysis of multi-block data 
with the aim to simultaneously extract common and individual latent 
components with desired properties and types of diversity. Illustrative 
examples are given to demonstrate their effectiveness for biomedical 
data analysis. 

Index Terms —(Multiway) blind source separation (BSS), (multilinear) 
independent component analysis, non negative/sparse matrix/tensor 
factorizations, group and joint independent component analysis, in¬ 
dependent vector analysis (IVA), CPD (PARAFAC) decompositions, 
constrained Tucker decompositions for multi-block data, data fusion, 
analysis of multi-relational data. 
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1 Introduction 

Development and wide deployment of various sensor 
technologies have resulted in the generation of large 
amounts of multi-block data that need to be jointly 
analyzed in order to extract physiologically meaning¬ 
ful and useful hidden (latent) components. Medical 
recording and diagnostic devices play an important 
role among those. In order to explore spatial, tempo¬ 
ral, and spectral differences and similarities in multi¬ 
array, multi-block bio-signals and images, researchers 
developed a variety of strategies based on component 
analysis and multiway factor analysis. In this paper, 
we focus on analysis of multi-block data that are 
naturally connected using tensor/matrix factorizations. 
We critically review emerging multiway component 
analysis or multilinear blind source separation ap¬ 
proaches that include procedures for extracting hidden 
components (or source signals) with specific features 
or constraints (e.g., sparsity, nonnegativity, statistical 
independence). The presented approaches can be ap¬ 
plied not only to unsupervised multilinear blind source 
separation but also for missing data imputation (tensor 
completion), feature extraction, classification, cluster¬ 
ing and anomaly detection, especially in application 
to electroencephalography (EEG), electrocorticography 
(ECoG), and magnetic resonance imaging (MRI) data. 
Multi-block tensor decompositions and multi-way anal¬ 
ysis allow us to discover meaningful hidden spatio- 
temporal structures of complex data and perform gen¬ 
eralizations by capturing multi-linear and multi-aspect 
relationships. The challenge is how to analyze complex 
inhomogeneous multi-set large-scale multi-array data. 
We describe the key procedure to apply tensor/matrix 
factorization/decompositions methods for these prob¬ 
lems in practice and demonstrate a number of success¬ 
ful examples. 

Tensors have been adopted in diverse branches of 
data analysis, such as in signal and image process¬ 
ing, psychometrics, chemometrics, biometrics, quan¬ 
tum physics, quantum chemistry and neuroscience Q- 
| [1Q) . Tensor decompositions provide extensions of blind 
source separation (BSS) and 2-way (matrix) component 
analysis (2-way CA) to multi-way component analysis 
(MWCA) and have proven to be suitable for 





2 


PROCEEDINGS OF THE IEEE 


dimensionality reduction of multi-way data, even if the 
data are incomplete and corrupted by noise. Tensor 
decompositions are particularly attractive for biomed¬ 
ical data analysis that exhibits not only large volume 
but also very high variety. Thus, they are suited to 
problems in bio- and neuro-informatics or computa¬ 
tional neuroscience where data are collected in various 
forms of large, sparse tabular, graphs or networks with 
multiple aspects and high dimensionality. For historical 
overview of tensor factorizations/decompositions and 
important scientific contributions in these areas in some 
chronological orders please check (2),@ and references 
therein. 

Moreover, multi-block tensors, which arise in numer¬ 
ous important applications that require the analysis of 
diverse and partially related data, can be decomposed 
to common (or correlated) and individual (or uncor¬ 
related) components. The effective analysis of coupled 
tensors requires the development of new models and 
associated algorithms that can identify the core rela¬ 
tions that may exist among the different tensors, and 
scale to large multi-sets data. 

This paper extends beyond the standard tensor de¬ 
composition models such as the unconstrained single 
block Tucker and canonical polyadic (CP) models, and 
aims to demonstrate flexibilities of matrix/tensor de¬ 
compositions for multi-block and multi-modal data, 
together with their role as a mathematical backbone for 
the discovery of hidden structures in large-scale multi- 
relational data 0. ©• 

1.1 Preliminaries - Basic Tensor Notations and Op¬ 
erations 

A higher-order tensor can be interpreted as a multiway 
array of numbers. Tensors are denoted by calligraphic 
capital letters, e.g., X G -x/Ar shall assume 

that all entries of a tensor are real-valued). The order of 
a tensor is the number of its "modes", "ways" or "di¬ 
mensions", which include e.g., space, time, frequency, 
trials, subjects, classes, and dictionaries 0, 0. A matrix 
has two modes: rows and columns, while an Ath-order 
tensor has N modes. 

Matrices (2nd-order tensors) are denoted by boldface 
capital letters, e.g., X, and vectors (Ist-order tensors) 
by boldface lowercase letters; for instance the columns 
of the matrix A = [ai, a 2 ,..., a^^] G are denoted 

by a^ and elements of a matrix (scalars) are denoted by 
lowercase letters, e.g., (or [AJ^^^). 

Subtensors are formed when a subset of indices is 
fixed. Of particular interest are mode-n fibers (vectors), 
defined by fixing every index but n, and slices which 
are two-dimensional sections (matrices) of a tensor, 
obtained by fixing all the indices but two, see Fig0 

The process of unfolding flattens a tensor into a 
matrix 0. In the simplest scenario, mode-n unfolding 
(matricization, flattening) of tensor X G x ^2 x ••• x/jv 



Fig. 1: Fibers, slices, and unfolding of a 3rd-order tensor. 

yields a matrix G MAx(7i---/n-i/n+i-"/iv) whose 

columns are all the mode-n fibers arranged in a specific 
order (in this paper rows and columns are ordered 
colexicographically). See Fig0for an illustration. 

The mode-n product of the tensor X G M^ix - x/iv 
and a matrix B G is the tensor C = 

X B G ]^^ix---x7^_ixJx/n+ix-'-x/AT with ontrios 

+ ~ = This can 

be also expressed in a matrix form as C(^) = BX(^). 

The Kronecker product (g) and the Khatri- 
Rao product © of matrices are defined in 

TABLE Moreover, we define (8)p^n ~ 

A^^) © • • • © • • • © and similarly 

define Op^„A(p). 

See Table [T] for the other useful notations. We refer 
readers to @.0 for more details about various tensor 
operations. 

2 Two-way Blind Signal Separation 

Blind source separation (BSS) is an unsupervised learn¬ 
ing method with the aim of estimating R source signals 
B G from the measurement matrix X G 

such that im , il2| 

R R 

X = AB^ = ^a,b^ = 5:a.ob„ (1) 

r=l r=l 

where the outer product o is defined in TABLE 0 
A = [ai, a 2 ,..., G is the unknown mixing 

matrix (also known as the basis matrix or dictionary, 
depending on application) and B = [bi, b 2 ,..., bi?] G 
I^TxT? matrix of sources b^ (components or la¬ 

tent variables). Although we explicitly assume that the 
columns of B are the latent variables, please keep in 
mind that the roles of A and B can be exchanged if we 
consider X^ in 0. 

Without any constraint, the matrix factorization prob¬ 
lem 0 is in general highly undetermined as it actually 
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TABLE 1: Notations And Definitions 


br/ ^m,r 

trace(B) 
krB 
X, X(,) 

®, 0 

C = A 0 B 

C = AQB 

e = X Xn B 


X = o b^^) o • • • o b^'^) 


Matrix, the rth-column, and the (m, r)th-entry of B. [B]m,r = ^m,r- 
Transpose, trace of matrix B 

Kruskal rank of B is the maximum value ensuring that any subset of kre columns is linearly independent. 
Tensor, the mode-n matricization of tensor X. 


Frobenius norm of B G i.e., ||B||^ = (Em r ^mr)^ • ^or tensor X, 

Element-wise (Hadamard) product, division of matrices or tensors. 

Kronecker product of A G and B G M^ 2 xJ 2 yields C = G 




aj G 


I1I2 X 
JXin 


IIf = I|X(„)||^, Vn. 
M^i^2xJiJ 2 with entries 
b 2 ••• bj] G M^2XJ 

B G 


Khatri-Rao product of A = [ai a2 

yields a matrix C = [ai 0 bi a 2 0 b 2 • • • aj 0 bj] G ^ 

Mode-n product of tensor X G x ■ • • x /at and matrix B G 1 
1^/1 X ••• x/^_i X X ••• x/jv with entries Ci 

and matrix representation = BX(^). 

Outer product of vectors b^’^) G yields a rank -1 tensor X = b^^^ ob^^) o • • • ob^-^) with entries 




yields tensor C = X x 

*Ar ~ EE=1 ^hd2,---diV 


has an infinite number of solutions. However, with very 
little a priori information, BSS enables the recovery of 
sources in the sense that 

B = T^(X) = BAP, (2) 

where B is an estimate of B, denotes a suitable BSS 
algorithm, A is a diagonal scaling matrix and P is a per¬ 
mutation matrix, denoting the unavoidable scaling and 
permutation ambiguities, respectively |^. This feature 
is called essential uniqueness of BSS. From a more 
general sense, 0 essentially provides a representation 
or transformation of the data matrix X that preserves 
as much of the information as possible. Different types 
of information to be preserved lead to various BSS 
criteria and methodologies. Below we briefly introduce 
the most popular ones. 

Principal component analysis (PCA). PC A transforms 
data matrix X into a set of uncorrelated variables, called 
the principal components using either eigenvalue or 
singular value decomposition (EVD/SVD). A common 
use of PCA is for dimensionality reduction where only 
a set of principal components is retained to preserve the 
maximum variance in the data. In PCA, uniqueness is 
achieved by imposing orthogonality on the transform 
matrix. 

Independent component analysis (ICA). If we rewrite the 
matrix product in Q using the random vector notation 

x(t) = As(t), t = l,2,...,T (3) 

thus interpreting each row of X as T-dimensional re¬ 
alization of the entries of the random vector x(t) and 
the rows of B^ in Q as the T-dimensional realization 
of the entries of the random vector s(t), then we can 
pose the problem as one that estimates a demixing 
matrix W such that the rows of Wx are as independent 
as possible. In other words, the demixing matrix W 
is expected to be the pseudo-inverse of the mixing 
matrix A, neglecting the scaling and permutation am¬ 
biguities. With this principle, we need a criterion to 
measure the statistical independence, and the most 


natural criterion is the maximization of independence 
through minimization of mutual information which can 
be shown to include others such as maximum likeli¬ 
hood and maximization of non-Gaussianity [[T2|-p4|. 
Another approach is to explicitly estimate higher-order 
statistics such as cumulants to establish independence 
using joint diagonalizations as in joint approximate 
diagonalization of eigenmatrices (JADE) p^ . ICA can 
be viewed as a generalization of PCA in the sense that 
it requires to venture beyond second-order statistics 
to incorporate higher-order statistical information [ [T^ [. 
ICA is attractive because by only using the natural 
assumption of statistical independence, it enables an 
essentially unique decomposition without the need for 
any additional constraint. For example, when both non- 
Gaussianity and sample dependence are taken into 
account in the formulation, any source, including those 
that are Gaussians can be identified subject to permu¬ 
tation and scaling ambiguities as long as the Gaussian 
sources do not have proportional covariances | [T^ . 

ICA has proven to be very useful for functional MRI 
(fMRI) data analysis, and can be performed in two dif¬ 
ferent ways |[^-|^: namely spatial ICA that extracts 
unique independent spatial maps, and temporal ICA 
that extracts independent time courses by considering 
X^ in Q. Although they are essentially just two differ¬ 
ent ways for organizing the data, spatial ICA is more 
useful as the spatial independence assumption is well 
suited for the systematically non-overlapping nature 
of the spatial patterns for most cognitive activation 
paradigms |^, p^ . 

Sparse component analysis (SCA). Sparsity is a natural 
property of many real-word signals, for example, ac¬ 
tivities of our brain, eye movements, and heart beat 
among others. SCA is a technique that allows us to 
recover latent source signals by exploiting the sparsity 
of source signals and/or the mixing matrix; and it 
is even able to separate dense signals, because some 
smooth dense signals often can be very sparse in a 
transform domain pO) . For example, EEG signals are 
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much more sparse in the frequency domain than in the 
time domain pA\ . In this case, BSS can be performed in 
the transform domain using SCA and then the outputs 
can be inversely transformed to construct the original 
source signals. SCA is particularly important because, 
by incorporating compressed sensing |^j, it allows the 
solution of underdetermined BSS problem, i.e., when 
there are fewer observations than sources (/ < R) in 0- 
Before we can do this, of course, we need to know the 
mixing matrix A, which is often estimated by clustering 
the observations using the sparsity assumption on the 
sources j^, or by blind identification methods utilizing 
the statistical independence of sources | [^ , 
Nonnegative matrix factorization (NMF). In power spec¬ 
tral analysis for EEC data, the latent frequency com¬ 
ponents are expected to be nonnegative and generally 
very sparse. In this scenario, NME proves to be par¬ 
ticularly useful as it imposes nonnegativity on both 
the mixing matrix and the source signals | [^ . Due 
to the nonnegativity constraints, the data matrix is 
represented as purely additive combinations of basic 
building blocks while without any subtractions, en¬ 
dowing NME with the special ability of learning-parts 
While NME achieved great success in machine 
learning, lack of uniqueness guarantee in the general 
case poses difficulty when it is applied to BSS. Eor- 
tunately, by imposing additional sparsity—even very 
weak sparsity—on the sources, NME is able to pro¬ 
vide unique factorization 1^, 1^. Typically, under the 


pure-source dominance condition |Q, that is, for 
each signal there exists at least one instant at which 
only this signal is active or strongly dominant, NME can 
be a powerful tool to separate statistically dependent 
nonnegative sources ||^, j^. NME based on the pure- 
source dominance assumption is called separable NME, 
which is highly scalable and can be applied to discover 
the most representative members in a large database 
|3T1. 


If we further require each row of B to be pure-source 
dominant, i.e., the whole matrix B to be a sub-matrix of 
a permutation matrix, we obtain the orthogonal NME 
model. Orthogonal NME is quite useful as it proves to 
be equivalent to A-means clustering j^, where 
the columns of A are the cluster centroids and B is the 
cluster indicator matrix. This representation is certainly 
unique as it meets the strongest form of the pure- 
source dominance condition. In summary, NME and its 
various variants have been powerful and versatile tools 
in machine learning and signal processing. 

Smooth Component Analysis (SmCA). Smoothness is 
another important intrinsic property that provides ad¬ 
vantages, especially for biomedical image and video 
processing. In SmCA, we impose smoothness con¬ 
straints not on the raw data X, but rather on the hidden 
components, i.e., vectors of the factor matrix B, and/or 
on basis vectors of the mixing matrix A. SmCA can 
be implemented by solving the following optimization 
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Eig. 2: Illustration of how to generate high-order tensor 
data for EEC data analysis, by incorporating the time- 
frequency representation (TER) of each channel. 


problem |Q, [ [^ : 


mm 

A,B 


X-AB' 


|2 
I F 


■7i||LiA| 


2,1 


■72 


IlLoBl 


2,1 ’ 


( 4 ) 


where H-H^ is the Erobenius norm defined in TABLE 
^ 2,1 matrix norm is defined as ||Y ||2 ^ \\yt \\2 ^r 

Y G 7 i ,72 are nonnegative penalty parameters 

that control level of smoothness, and matrices Li,L 2 
represent difference operators, typically, the matrix with 
all its entries being zero, except 1 on its diagonal and 
— 1 on its superdiagonal 0 0, |35). 


After we build a proper objective function, BSS is 
finally achieved by solving an optimization problem. 
While many standard optimization approaches can be 
applied, some typical methods that are especially suit¬ 
able for BSS are briefly introduced here. Taking into 
account that the mixing matrix in ICA is full column 
rank and hence has a certain underlying structure, the 
concept of natural gradient was proposed and proved 
to be the steepest direction p6) , | [37| . On the other hand, 
for NME and SmCA, alternating descent methods are 
quite useful since more than one parameter matrix need 
to be optimized. In these types of methods, each time 
the optimization is performed over a subset of param¬ 
eters (typically, one factor matrix or even only one col¬ 
umn of it) with the others fixed Q, |[3^. The alternating 
direction method of multipliers (ADMM) | [3^ , which 
has been enjoying growing popularity recently and is 
well suited to large-scale optimization problems, also 
falls into this category. Basically, alternating descent 
is rather a principle than a concrete method and has 
been widely applied to solve large-scale optimization 
problems that arise in machine learning and related 
areas | |3^ . 
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3 Multiway Data Analysis 

So far the data we considered were represented as 
a single matrix, which has only two dimensions, or 
modes. In many applications the data can be naturally 
represented as high dimensional arrays, i.e., tensors. 
For example, in EEG, each trial can be formed by a 
matrix with two modes of channels (electrodes) and 
time, and multiple trials form a 3rd-order tensor of 
channel X time X trial. If we consider the time-frequency 
representation of each channel, we obtain a 4th-order 
tensor of channel x time x frequency x trial, see Fig|^ To 
analyze such type of data, we may simply unfold them 
to form a large matrix and then apply the variety afore¬ 
mentioned BSS methods. However, this often causes 
loss of internal structure of high-order data. It is more 
desirable to take the tensor structure into account when 
we analyze such data. 


3.1 Basic Tensor Decomposition Modeis 

The Tucker Decomposition. Given an A^th-order tensor 
X G MAx/2 - x/iv^ Tucker decomposition, it can be 
represented as 0-0 

X = g Xi X2 B(2) ... xjv bW 

= [g;B(i),B(2),... ,bW1, 


where B(") = 


■■■ b^ 

mode-n factor component matrix consisting of latent 
components as its columns, S G M^ix ^2 - xi?7v 
is the core tensor reflecting the connections (or links) 
between the components, and {Ri, R 2 ,..., Rn) with 
Rn = rank(X(^)) is called the multilinear rank of X. 
Fig|^a) illustrates the Tucker decomposition of a 3rd- 
order tensor. 

Unconstrained Tucker decomposition is generally not 
unique, because an equivalent representation can be 
obtained by replacing with and S with 

Sx^Q-i inig, where Q is any Rn-hy-Rn nonsingular 
matrix. In other words, estimates the range of 

X(^). Hence, we often apply truncated SVD to mode- 
n matricization X(^) for n = 1,2,...,A^, in order to 
obtain orthogonal factor matrices This method is 
well known as the high-order SVD (HOSVD) and has 
been widely applied to multidimensional data analysis 

The canonical polyadic (CP) decomposition (also known 
as PARAllel FACtor analysis (PARAFAC) or CANonical 
DECOMPosition (CANDECOMP)). If we restrict the 
core tensor S in 0 to be diagonal with Ri = R 2 = • - = 
Rn = R, we obtain the CP model that decomposes 
the target tensor into the sum of rank-one terms (see 

Fig0»): 




P I-n X Rn 


is the 


X = YC) b(l) ob^2) ^ 

= [A;BW,B(2),..,,bW1, 


( 6 ) 
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(b) CP model 

Fig. 3: Illustration of the Tucker decomposition and the 
CP decomposition for 3rd-order tensors. 


where o denotes the outer product of vectors, and A = 
diag(Ar). The minimum number of R is called the rank 
(or explicitly, CP-rank) of X Q. 

Using the mode-n matricization operator, 0 and 0 
can be rewritten in matrix factorization forms as 


XT) =A(")b(”)^, n = l,2,...,X, 


( 7 ) 


where = 




b(u 




for the Tucker de¬ 


composition, or 

decomposition. Both models are multilinear models, 
and we often update the factor matrices in an alternat- 

Toblems 
for 


ing fashion by solving N least squares (LS 


X(^) -B(^)A(^)“ 


see 


§- 



3 , 38 


44 


miriACn) 

details. 

Tensor decompositions have been widely applied 
to brain signal processing j^, l|^-||4^- 1^/ the 

CP decomposition was employed to extract time- 
frequency-space atoms to identify the different signal 
components of the EEG data, which renders tensor 
decomposition an efficient approach for processing EEG 
data. In ||^, |Q, tensor-based EEG preprocessing was 
introduced to perform localization of brain sources 
based on EEG measurements, which can provide im¬ 
proved performance in the case of several simultane¬ 
ously active brain regions and low signal-to-noise ra¬ 
tios. In 1^, a tensor toolbox for event-related potentials 
(ERP) analysis was developed, which includes multi¬ 
way decomposition methods and visualization of time- 
frequency transformed ERP data. The group analysis 
can be performed naturally by tensor representation of 
brain activities recorded from varying subjects, condi¬ 
tions or repeats. Based on several strategies to represent 
EEG in a tensor format, tensor decompositions, espe- 
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cially, nonnegative tensor decompositions, are applied 
to extract the dominant sources of brain activity in 
an unsupervised manner, which can also be used for 
analysis of group differences between subjects. 

As a natural extension of BSS, multiway BSS is an 
attractive and promising approach because it allows 
components to be simultaneously extracted from differ¬ 
ent domains (i.e., modes) of tensor data. For example, 
in EEG data analysis, via multiway BSS we can extract 
the spectral components, the temporal components, and 
the spatial components simultaneously, together with 
links between them, reflected by the core tensor. As 
essential uniqueness is one key feature of BSS, next 
we discuss unique tensor decomposition models and 
associated algorithms. 

3.2 Multiway Blind Source Separation 

As the Tucker decomposition is not unique in general 
sense, we often need to impose additional constraints 
on the factor matrices and the core tensor to obtain a 
unique and physically meaningful representation. Com¬ 
paring (0 with Q, the columns of are just linear 
mixtures of the columns of This motivates us to 
extract by applying BSS algorithm to directly, 
which has been discussed in ||^| in terms of 

multilinear 1C A and multiway BSS. Multiway BSS can 
be considered as an extension of the high-order SVD 
where orthogonality constraints on factors are replaced 
with the other ones such as statistical independence, 
nonnegativity, sparsity, smoothness, depending on a 
priori knowledge about the components Q, |l^. As 
constrained Tucker decompositions, multiway BSS is of 
particular interest in practice because it provides essen¬ 
tially unique components that have specific statistical or 
deterministic properties (e.g., statistical independence, 
smoothness, sparseness and nonnegativity [?]). Partic¬ 
ularly, different from 2D BSS, multiway BSS is able to 
accommodate all these diverse constraints or properties 
of factors in one Tucker model, allowing desirable 
interpretation of data and flexible feature extraction. 

The CP decomposition can be interpreted as the 
Tucker model with additional structure constraints, and 
hence is essentially unique under the well-known suf¬ 
ficient condition [ |S4] |: 

N 

J2^n>2R+{N-l), (8) 

n=l 

where Kn is the Kruskal rank of B^’^^ (see TABLE [^, and 
R is the rank of X = |A; B*^^\ B^^\ ..., More re¬ 

laxed conditions can be found in This uniqueness 
analysis is purely based on multilinear algebra while 
neglecting the physical attributes of factors. 

One of the most important applications of the CP 
decomposition in BSS is blind identification of the 
mixing matrix. If we consider the ICA problem defined 
in the cumulant (can also be any other high-order 


statistics) of x just forms a higher-order symmetric 
tensor 

R 

Cx = ^ ^ Cr o o • • • o , (9) 

r=l 


where is the cumulant of s^(t), and a^ is the rth 
column of A. As a result, the mixing matrix A can 
be estimated via the CP decomposition of Cx, and the 
identifiability is guaranteed by the uniqueness of the 
CP decomposition of Cx- This approach is attractive 
because it is able to estimate the mixing matrix of even 
underdetermined mixing systems [ [T2| , [T^ , | [^ , 

If the sources have certain temporal structures, e.g., 
each source signal Sr is either a deterministic ergodic 
sequence or a stationary process |Q, j^, the covari¬ 
ance matrices of x(t) satisfy 


Cx(t) = A diag(cs(T)) A"^, t = 0,1,2,.. 


( 10 ) 


where r is a time-lag, and diag(cs(r)) is a diagonal ma¬ 
trix whose diagonal entries are Cs^{r) = E[s{t + r)s(t)], 
see for details. It can be seen that is just 
the joint approximate diagonalization (JAD) of a set of 
symmetric matrices Cx(t), r = 1 , 2 ,..., and has been 
extensively studied within the last two decades, see 
e.g., p8|-|[^. If we stack Cx(t), we can obtain 

a 3rd-order tensor C satisfying 

T 

e = ^arOarOCs{T), ( 11 ) 

r=l 

where s{t) is a vector of s{r), V r. 0 allows us to 
estimate A by performing the CP decomposition of 
tensor C. Again, the identifiability of A is guaranteed 
by the uniqueness the CP decomposition of tensor 6. 

By comparing (0 with 0. we can also incorpo¬ 
rate ICA to CP decompositions. A tensorial exten¬ 
sion of probabilistic ICA (PICA), named tensor-PICA, 
was proposed in [bT] for multisubject fMRI analysis. 
In tensor-PICA, assuming one component matrix, say 
has mutually statistically independent columns 
(components), a two-dimensional ICA criterion is used 
to update B^’^^ while the LS criterion is applied to 
update the other factor matrices, and all factor matrices 
are updated in an alternating manner. However, this 
method suffers from a divergence issue caused by the 
different objectives of the ICA step and the LS step 
In [ [62| a new method combining ICA and the 
CP decomposition was proposed, which overcomes the 
disadvantages of tensor-PICA but still suffers from 
high computational complexity. In a more efficient 
method was proposed, where one factor matrix 
was extracted by applying BSS to (see 0 ) at 

first, then, by virtue of the special Khatri-Rao prod¬ 
uct structure of A^’^^ = all the remaining 

factor matrices can be recovered by using a series of 
SVDs of rank-1 matrices. Particularly, once the matrix 
extracted by using BSS is of full column rank (which is 
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almost always true), all the other factor matrices can 
be uniquely identified, even if the general algebraic 
uniqueness condition § for the CP decomposition is 
not satisfied, which somehow extends the uniqueness 
analysis of the CP decomposition. In |[^ the robust 
CP decomposition was also proposed to separate un¬ 
derdetermined mixtures of event-related sources that 
include quasi-periodic signals (e.g., electrocardiogram 
(ECG)), sources with synchronized trials (e.g., ERP), 
and amplitude-variant sources. 

Note that under the nonnegativity constraints on 
the factors, the special Khatri-Rao product structure 
of in 0 often makes very 

sparse even if (p 7 ^ n) have weak sparsity. 
The sparsity of B^’^^ allows us to uniquely estimate 
B^’^^ and A^’^^ first by applying NME to X(^), and 
then recover all factor matrices B^^^ from p ^ n, 

which is an efficient way to perform nonnegative CP 
decomposition (also known as nonnegative tensor fac¬ 
torization). Nonnegativity is important for the CP de¬ 
composition because the optimal low-rank approxima¬ 
tion under nonnegativity constraint always exists [[6^ 
and is almost always unique | [66) , which is not true for 
unconstrained CP decomposition. Hence, nonnegative 
CP decomposition is a very promising way to perform 
nonnegative multiway data analysis 0 , H), 113. 

4 Linked Analysis of Multi-block Data 

Multi-block data typically include multiple measure¬ 
ments of the same phenomenon under various ex¬ 
perimental conditions. Eor example, EEC signals or 
fMRI data of the human brain in response to a certain 
stimulus, but from different subjects and trials, can be 
grouped together and naturally linked as multi-block 
data. Multi-block data may also refer to multi-modal 
data where different types of pulse sequences can be 
used in a single scanning session to form multimodal 
MRI datasets for each subject, or modalities such as 
fMRI and EEC might need to be considered jointly jS^ . 
All these possibilities pose the challenge to develop sys¬ 
tematic approaches for fusing these various data types 
together to automatically find common patterns—if 
they exist—of related changes across multi-block data 
|13}, (68|. Thus, linked component analysis is an 
emerging research topic with great potential in biomed¬ 
ical data analysis as it allows the extraction of not 
only shared common components across multi-block 
data but also individual components specific for each 
dataset. 

A number of models have been introduced to enable 
analysis of multi-block data by either defining hard 
links among matrices/tensors through shared factors, 
or by soft statistical links as in the case of independent 
vector analysis (IVA) and canonical correlation analysis 
(CCA). To define these models, first consider a set of 
data matrices X/^ G k = 1,2, ...,A. Examples 


include brain imaging data where each X/. is repre¬ 
sented as voxel X time points or voxel x subject. These 
data blocks are naturally linked as they are collected 
across a single shared mode (time T), and can be 
represented as in 0 for all K blocks (see Eig0 

Xk = AkBl,k = l,2,...,K. (12) 

In what follows, we first present models where the 
main assumption is the existence of common compo¬ 
nents across all the datasets and is imposed through 
the model directly as hard constraints, then in Sec¬ 
tion 4.2, we consider the use of soft links that take 
advantage of statistical relationship, typically in the 
form of statistical dependence as in IVA, CCA, and 
multiset CCA (MCCA), and finally introduce models 
that explicitly consider and model common as well as 
distinct components within each dataset. 


4.1 Models with common factors 

To investigate shared common information across 
multi-block data, e.g., common spatial maps, a number 
of models have been proposed p3|, [19|, |69|: 


Concatenation-based ICA models, as the name sug¬ 
gests, concatenate the data blocks either in the verti¬ 
cal dimension as in Croup ICA |16| or horizontally 
as in joint ICA [ [7Q| . In a simple concatenation, the 
assumption is that all observations are spanned by a 
set of exactly the same components. Eor example, as 
illustrated in Eig0 in 


X/e = A/cB^, /c = 1, 2, 




(13) 


B is shared by all data blocks whereas the mixing 
matrix G is associated with X/. only. By 

concatenating the observations vertically to form a tall 


matrix X = 


X2^ • 

■■ XT 

1 

G h)y<T^ 

have 








X = 

= ABT 


(14) 





T 


where A = 



.. 


Note that if hori- 


zontal concatenation is applied (assuming that Ik = /, 
V/c in Eig0, we simply transpose X, which leads to 
a model that assumes a common mixing matrix—the 
matrix A in —for the ICA decomposition. This is 
the model adopted in joint ICA where we have X/. = 
AB^, /c = 1, 2 ,..., A, and the common (mixing) matrix 
A is used to fuse multi-modal data such as fMRI and 
EEC represented as subjects by voxels and subjects by 
time points respectively ||^. In this application, the 
columns of the mixing matrix include subject covari¬ 
ations and when there are multiple groups, such as pa¬ 
tients and healthy controls, biomarkers can be identified 
through a t-test on these columns. Or, as explained in 
| [6^ , multiple blocks X/^ can be multi-task and multi¬ 
subject data of the same type and dimension and can 
be stacked either vertically or horizontally depending 
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Tucker decomposition 
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Ij^ — d, J — J , Vfc. 
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CP decomposition 
Gfc is a square 
diagonal matrix, 

Ak = Ae 

^ — L, J]^ — R^ V/n. 



Fig. 4: Decomposition models interpreted as linked component analysis of multi-set data: X/^ = A/^B^ = 
A/eG/cB^, where the latent components, i.e., the columns of B are assumed to be exactly the same for all data 
blocks. 


TABLE 2: Decomposition models for linked component analysis of multi-block data using common components 


Decomposition 

models 

Linked component analysis 

(fc = 1,2, 

Concatenation-based models 

joint ICA, group ICA, and variants 

Xj = BAj or Xfc = 

where B contains the shared common components, see jl3|. 

CP decomposition 

X/c = A Sfc B^, where A and B contain the shared common components, see (T^. 

Sfc is a diagonal (scaling) matrix, A can be an arbitrary matrix, or has a Khatri-Rao product structure, 
e.g., A = Ml 0 M 2 , which is equivalent to the CP decomposition of a higher-order tensor. 

Tucker decomposition 

X/c = A Gfc B^, A and B contain the shared common components, see (Tt). 

A is an arbitrary matrix, or has the Kronecker product structure, e.g., A = Mi C) M 2 , which is 
equivalent to the Tucker decomposition of a higher-order tensor. 


on the interpretation. An important step included in 
group ICA, which has been implemented in the GIFT 
toolbox (available at http://icatb.sourceforge.net), is 
double dimension reduction stages using PC A. If each 
X/j; represents a subject data such as fMRI (time points 
by voxels), then we first perform a subject level PCA, 
and after vertical concatenation of dimension-reduced 
subject data, a second level PCA is applied at the group 
level to estimate a common group subspace |[^, [ |7T] |. 
Individual subject maps are then reconstructed using 
the group and subject-level PCA matrices thus pre¬ 
serving most of the variability for individual subjects. 
Other implementations and applications of the group 
ICA model are also possible and discussed in | [6^ . 

Compared with Q, the concatenated ICA model de¬ 
scribed in (p^ is merely the partitioned version of stan¬ 
dard BSS Q, which is a popular technique to solve the 
computational issue in the case where X is extremely 


large in one dimension. In addition, this model allows 
for explicit modeling of common modes across e.g., 
specific trials or subjects. One major limitation of the 
joint ICA model is the strong assumption that all data 
blocks share exactly the same common independent 
components. The approach proposed in |72| uses CCA 
and MCCA such that the assumption of identical 
mixing matrix (hence subject covariations of |7Q|) is 
relaxed and instead maximally correlated subject co¬ 
variations are estimated. 


Tensor ICA stacks all the data blocks to create a 
3rd-order tensor X, rather than 2D matrices obtained 
by concatenation. This technique has been used for 
multi-subject fMRI analysis based on the CP model, 
for tensors with dimensionality of voxel x time x subject 
|[M), [[74|, [[75|. In this method, data for each subject can 
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be expressed as 

= Xfc = A diag(wfc) B"^ 

= [A diag(wfe)] B"^. 

Note that the mixing matrix associated with X/^ is 
Adiag(w/c), implying all blocks actually share the same 
mixing matrix but differently scaled by their individual 
weighting vectors This of course will lead to a more 
restrictive model in comparison with ([^, which can be 
clearly observed from the relationship 

xT) = (W0 A)BT, (16) 


where W = 


Wi 


W2 • • • V/R 


e X( 2 ) is the 


mode-2 unfolding of the data tensor X, (W 0 A) serves 
as the mixing matrix and has a Khatri-Rao product 
structure, and A captures the latent spatial maps with 
additional independence constraints. Note that X^^ is 
just the vertical concatenation of X/.. In other words, 
the major difference between the concatenation based 
ICA ( p3| ) and the tensor ICA is that: In the concate¬ 
nation based ICA, the mixing matrix is not constrained 
whereas in the tensor ICA it has a Khatri-Rao product 
structure. The authors of [ [74| | claimed that it can be a 
beneficial feature of the tensor ICA because it avoids 
unnecessary duplication of the spatial maps and can 
allow them to be inferred more accurately when the 
assumption holds. 

It is worth noticing that ([^ can be viewed as 
asymmetric joint diagonalization of matrices X/^, an 
extension of the symmetric JAD problem described in 
( [Tq| ), which shows that the CP decomposition in general 
is merely a technique that simultaneously diagonalizes a 
set of matrices (or more generally, tensors). Remem¬ 
bering the discussion in Section 2, SVD diagonalizes 
a single matrix with orthogonality constraints whereas 
the CP decomposition jointly diagonalizes multiple ma¬ 
trices but without additional constraints. Hence the CP 
decomposition is essentially a useful tool for linked 
component analysis of multi-block data that are of the 
same size and share common factors in each dimension. 
For example, in |^, the CP decomposition was used 
for shared processing of perception and imagery of 
music, in order to determine if both the spatial location 
and temporal evolution are shared between tasks, iden¬ 
tifying common networks and their timing signatures. 

A natural question that may arise is whether it is 
meaningful in practice to relax model ([^ to 


X,,,k=Xk = AGkB'^, (17) 


where G/c, associated with X/^, is not necessarily di¬ 
agonal any more. This interesting extension has been 
studied in [77) in terms of population value decom¬ 
position (PVD) and has been applied to discover com¬ 
mon features from thousands of similar images. This 
simple model allows to reduce a very large number of 
images to a manageable set of parameters and to make 


statistical inferences on samples of large-scale datasets. 
The authors demonstrated the potential of this method 
by applying it to analyze the sleep heart health study 
dataset. Moreover, the entries of matrix Gk can be used 
as predictors in a regression context, e.g., to predict the 
risk of Alzheimer's disease using fMRI | [77| . If we stack 
X/c to form a 3rd-order tensor X, and similarly for G/^, 
by using the Tucker model, (Tt) can be rewritten using 
tensor representation as (see Fig|^ 

X = gxiAx2B, (18) 

implying the multi-block data X/^ = share com¬ 

mon factors A and B, but with subject-specific infor¬ 
mation kept in Gk = 

In summary, ( p^ and (Tt) are special cases of the CP 
model and the Tucker model, respectively, which can be 
extended to higher-order tensors |78) and applied to the 
scenarios where multi-block data are of the same size 
and share common components in all modes. Moreover, 
the concatenated ICA models (group ICA and joint 
ICA), the CP decomposition, and the Tucker decom¬ 
position all perform linked analysis of multi-block data 
with the assumption that all samples are dominated 
by the common components, and with the exception of 
joint ICA that uses horizontal concatenation, all models 
assume the blocks to be of the same size. We summarize 
these models in TABLE |2l 


4.2 Models with statistical links: IVA, CCA, MCCA, 
PLS, and their tensorial extensions 

The methods discussed in Section 4 assume that the 
data blocks share exactly the same latent (indepen¬ 
dent) components, which could be too restrictive for 
some applications and lead to suboptimal solutions. 
In contrast, IVA [ [l3| , [ [7^ |, CCA | [80| and its multiset 


extension MCCA [73], as well as partial least squares 
(PLS) are more flexible methods, and provide advan¬ 
tages especially when not all modes are common across 
the blocks. It has been shown that MCCA can be used 
to perform BSS fusion of EEC, fMRI and structural 


MRI data [72|, and for brain computer interface (BCI) 
1^ , and IVA for multi-subject fMRI analyi s |[T3| , arti¬ 
fact rejection in concurrent EEG-fMRI data |83| among 
others. In what follows, we first briefly review matrix 
decomposition using IVA, discuss how it generalizes 
CCA and MCCA—like the generalization of PCA using 
ICA—and then discuss the multilinear extensions for 
CCA and PLS @. 

Independent Vector Analysis (IVA) can be formulated by 
considering the model in ^ for K datasets such that 


x/c(t) = AkSk{t), /c = 1, 2,..., A. 


(19) 


Hence, in this model, as opposed to those introduced 
in Section 4.1, all datasets are assumed to have their 
own unique mixing and source/component matrices. 
The link across the datasets is achieved in a "soft 
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manner" by defining an optimization criterion that 
fully takes advantage of the statistical dependence that 
exists among the multiple datasets. The key definition 
in the formulation of IVA is the source component 
vector (SCV) that is formed by using the corresponding 
elements of the source random vectors Sk{t) G 
such that an SCV is given by [si^i{t), S 2 ,i(t),..., 
for i = 1,2,where the second subscript refers 
to the source index for each dataset in ( p^ . The IVA 
decomposition is achieved by minimizing the mu¬ 
tual information among the /, each A-dimensional 
SC Vs, as opposed to sources in ICA ||^, ||^. Hence, 
IVA achieves a decomposition of independent sources 
within each dataset while estimating SC Vs that are 
dependent within its entries (see Fig|^. It is this de¬ 
pendence that helps with the mitigation of permutation 
ambiguity for sources (modes) across the datasets, and 
hence enables IVA to identify sources that are indepen¬ 
dent and identically distributed Gaussians as long as 
they are correlated across the datasets. Complete iden¬ 
tification conditions and bounds are given in [ [7^ [. By 
properly aligning corresponding independent compo¬ 
nents from each dataset—whenever they are related— 
IVA enables a convenient and useful way to perform 
multi-subject analysis as in fMRI and allows one 
to exploit dependence for other uses such as in artifact 
rejection [ [83| . It has been shown that IVA is able to 
preserve subject variability compared to group ICA 
both by simulations ||^, ||^ and also using real data 
where there is significant variations among the subject 
fMRI scans, such as in patients with stroke | [8^ and 
schizophrenia | [^ . 

When the SCV distribution is assumed to come 
from a multivariate Gaussian distribution, it can be 
shown that IVA cost function becomes equivalent to 
one of the five cost functions introduced for MCCA 
(T^ , Its) . In 1^ , maximization of correlation across 
elements of an SCV, and hence of multiple datasets, 
is achieved by making the covariance matrix of an 
SCV as ill-conditioned as possible through five cost- 
functions introduced by heuristic arguments, and each 
SCV is thus estimated in a deflationary manner. All five 
cost functions reduce to CCA when there are only two 
datasets. MCCA has been applied to both multi-modal 
data fusion and multi-subject fMRI analysis |Q, | [M) 


and hence IVA can be considered as a generalization 
of MCCA to the case where higher-than second-order 
statistical relationship among the datasets are taken 
into account by selecting an appropriate multivariate 
density model for the SCV [ [T3| [. 

Multilinear extensions of CCA and PLS. Suppose we 
have K pairs of sample instances {(X/c^/c), k = 
1,2, ...,A}, with Xk G M^ix^ 2 x-x/iv ^ 

]^Jix J 2 X - X Jm^ which can be concatenated to form two 
large tensors with additional sample dimension: X G 
^Iix---xInxk ^ ^Jix---xJmxk ^ yVe Seek projection 

vectors such that the correlation between the multilin¬ 
ear projections of X and ^ on them is maximized: 




u^v 


Vu^l 


( 20 ) 


where 


u = X(Ar+i)Wx, v = Y(M+i)Wy, (21) 

wx and wy are just the vectorizations of rank-1 tensors 
Wx and Wy, respectively, and 

(1) (2) (N) 

Wx = wV ^ o wV ^ O • • • O wV ^, 

Ji Ji Ji ' 

(1) (2) (M) 

Wy = Wy o Wy ^ O • • • O Wy . 

The vectors u and v are called the first pair of canonical 
variables, reflecting the common information shared 
by the two groups X and y. It turns out that ( [^ 
can be easily solved via HOSVD, by applying the La¬ 
grange multiplier method to ( |2Q| . Then after deflation, 
by solving ( [^ we seek the second pair of canonical 
variables, and so on as in MCCA. Note that the canon¬ 
ical variables can cross multiple modes of X and y, 
as long as they are of the same size in these modes. 
Multilinear CCA has been extended to the kernel space 
and applied, for example, to behavior data classification 
(84). 

Basically, CCA enables extraction of components that 
are highly correlated and is widely applied for feature 
extraction. In contrast, PLS is a similar well-established 
framework with the aim to predict a set of depen¬ 
dent variables (responses) from a set of independent 
variables (predictors) through the extraction of a small 
number of latent variables. It solves the following op¬ 
timization problem: 

max{wx,wy} u’^v, (23) 

where u and v are defined as in ( [21) . Compared with 
( |^ , it can be seen that besides correlation, the energy 
(variance) of u and v plays a key role in PLS. Again, ( |23) 
can be solved by HOSVD. Once u^v is maximized, we 
replace v in ( [21) by u that can be expressed by using 
X and Wx- As such, a connection between X and ^ 
is built, which allows us to predict from a given 
new sample In practice, we often extract multiple 

pairs of vectors to improve the prediction accuracy, 
following a similar routine as for the multilinear CCA. 
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Two standard tensor decomposition frameworks (i.e., 
CP and Tucker) are both considered as the counterpart 
of SVD for tensorial data. The multilinear CCA and PLS 
described in (|2Q|)-(|23|) are based on the CPD framework. 
Therefore, an alternative strategy is to apply Tucker 
decomposition in these methods, which results in a 
more flexible model in terms of fitting ability. For 
instance, multiway PLS based on Tucker model, called 
HOPLS ||^, can be described as 

max{S:,.wx,Sa,wv} (24) 

where u and v are the same as in ( [21) , but wx and wy 
are the vectorizations of Wx and Wy, respectively, and 

Wx = Sx Xi X 2 X ... W^, 

Wy = Sy Xi X2 X • • • Xm 

Hence, ( |22| ) can be considered as a special case of ( |25| ); 
and HOPLS is a generalized framework for multiway 
PLS, which has shown the promising performance for 
decoding of movement trajectories of a monkey's hands 
based on ECoG signals ||^, [ [^ . Similar strategy can be 
also extended to multilinear CCA. 


4.3 Common and Individual Feature Analysis 
(CIFA) 


The models discussed in Section 4.1 assume that a 
certain set of components are exactly the same across all 
datasets, on the other hand, those discussed in Section 
4.2 rely on statistical correlation (second and higher- 
order) to link the multiple datasets. Another approach 
is to explicitly model the common and distinct com¬ 
ponents, which is expected to be a better match to 
most physical systems, and hence lead to substantially 
improved performance. For example, if we analyze the 
human EEG signals in response to a certain stimulus, 
but from different subjects and for many trials, it is 
expected that these data would share some common 
components due to the application of the same stimu¬ 
lus while also possessing components specific to each 
individual. To address such scenarios, a more general 
linked matrix factorization model was recently studied 
in terms of a joint and individual variation explained 
(JIVE) [ [^ [ and common and individual feature analysis 
(CIFA) models [[^[-|[^. In these methods, a set of 
matrices A’ = {X/. G : k = 1,2, ••• ,A}, share 

at least one common dimension T. They can be, e.g., a 
set of multichannel EEG signals (channel x time point) 
recorded for the same visual stimulus but from different 
subjects. These data matrices can be factorized in a 
linked way as follows (see Eig|^a)) 


X/, = A/cB^ = [Ak Ak] 






=A/eB 




(26) 


where B G B/^, G Rj. = Rf. C is the 

number of latent components with R^ < h and C is 
the number of common components, Ak and Ak are the 
partitions of the mixing coefficients Ak corresponding 
to B and B/^. In this way, each data matrix X/^ is 
represented through a combination of components from 
the common (shared) subspace X/. = A/cB^ and the 
individual (intrinsic) subspace X/^, = Ak^J. Basically, 
the common information carried by the common sub¬ 
spaces X/c can be used to identify the whole group 
of coupled data, whereas the individual subspaces X/. 
can be used to identify the data member X/. as it 
is presented only in X/.. The objective is to estimate 
the matrix B representing common cornponents shared 
by all data blocks and the matrices B/. representing 
individual components associated with only the data 
matrix X/^. 

The objective of both JIVE and ClEA | [^ , 
is to solve this challenging problem we describe above, 
with the subtle yet important difference: JIVE quanti¬ 
fies the amount of joint variation between data types 
whereas CIEA captures highly correlated components 
while ignoring their energy (variances), which allows 
relatively weak common components to be detected 
and extracted. Eor this reason, JIVE can be viewed as 
the extension of PLS while CIEA is more closely related 
to CCA. CIEA advances this topic by introducing BSS 
methods to the analysis of each common/individual 
subspace, motivated by the link between 0, i.e., 

by defining 

Xk = A/eB^, Xk = AkBj. (27) 

This formulation allows us to separate common and 
individual components that are physically easy to in¬ 
terpret. Hence CIEA enjoys significantly improved flex¬ 
ibility and has demonstrated numerous potential appli¬ 
cations in clustering analysis and pattern recognition 
1^ . This paradigm is illustrated in Eig[^ 

If each data block is a high-order tensor rather than 
a matrix, more sophisticated models are required to 
perform comprehensive linked analysis. Eor simplicity, 
at first we focus on the common and individual feature 
analysis with respect to only one mode, and then we 
briefly discuss its extension to multiple modes. To avoid 
cumbersome notations, sub- and super- scripts, we also 
assume that the tensors are of order 3, but please keep 
in mind that they can be of any order and may also 
have different sizes—as long as they share at least one 
common mode, from which common and individual 
components will be extracted. Consider a set of tensors 
X={Xi,X2 ,... that share a common mode, say, 

mode-1. Suppose Xk ^ R^^^k, 2 xik, 3 ^ where Xk can also 
be a matrix with Ik ^3 = 1, V/c. Using the Tucker model, 
each tensor can be decomposed as 

Xfe = Sfc Xi X2 Bf X3 Bf, Vfc, (28) 




12 


PROCEEDINGS OF THE IEEE 



Fig. 7: The paradigm of common and individual feature analysis (CIFA) for multi-block matrix data analysis. In 
CIFA, the common orthogonal basis extraction (COBE) approach can be employed to separate the common 
and individual subspaces. Then, BSS can be respectively applied to the common and individual subspaces, to 
achieve common feature (i.e., F) extraction and individual feature (i.e., F/., /c = 1,2,..., A) extraction. 


where S/c E R^k,i^Rk,2y<Rk,3 jg ^he core tensor of X/^, 
and n = 1,2,3, are the corresponding component 
matrices. Assuming that the mode-1 factor matrices 
share some common components such that b[.^^ = 
], we obtain the following linked Tucker 
decomposition model: 

Xfc =Sk XiB(i) X2Bf^ XsBf 

+ Sfc XsBf XgBf (29) 

=X/c + V/c, 


where B*^^^ is shared by all data blocks whereas b[.^^ 
is associated with only, G R^^^k,2xRk,3^ ^ 

^{Rk,i-c)xRk,2y<Rk,3^ C is the number of common 
components in mode-1. 

For notational simplicity, we temporally denote the 
mode-1 matricization of X by i), and similarly for 
G(/j. 1), and G(/j. i); then can be rewritten in 

parallel with ( [^ as 




where k = 1, 2,..., A, and 


Ai'^ = (Bf 


(30) 


(31) 


The key difference between ( [SO] ) and lies in the 
multilinear structures of and shown in pT| . 

We call p^ or p0| CIFA-Tucker as the constrained 
Tucker decomposition model is a key model of this 
linked component analysis. To solve this problem, we 
may apply the ALS method used in JIVE, but with 
additional multilinear structure constraints imposed on 
and A^j^\ Alternatively, we can apply the two-step 
method proposed in [ [^ : in the first step dimensional¬ 
ity reduction can be performed to obtain p^ . Then, 
the rotational ambiguity associated with the Tucker 


model allows us to simultaneously rotate the columns 
of B^^^ and conversely the rows of G(/j. i) to achieve the 
separation of common and individual subspaces. Af¬ 
terwards BSS can be applied to respective subspaces to 
extract unique and physically meaningful common and 
individual components with specific properties, such as 
sparsity, nonnegativity, statistical independence. 

More generally, multi-block high-order tensor data 
may share more than one mode, each of which contains 
some common components. To model such a complex 
dataset, we can combine these modes to form a big 
mode that may contain some common components, by 
reshaping each data tensor Xk simultaneously using 
the technique introduced in [ [78) . As such, the matrix 
B^^^ in pQ| is replaced by the Kronecker product of 
common components coming from different modes. 
In other words, we can impose additional structure 
constraints upon B^^^ and b[.^\ This problem can be 
solved using the ALS ||^; implementation details are 
however omitted due to space limitation. 

5 Dimensionality Reduction for Incom¬ 
plete Data Corrupted by Outliers 

When we are ready to apply any specific method we 
have discussed to real-world problems, several practical 
issues arise. These include questions such as to how 
many latent components one should extract, and how 
to remove the noise and outliers from noisy and/or 
incomplete observation data. Particularly, multi-block 
high-order data may cause data deluge and pose new 
challenges for effective and efficient data analysis. A 
common approach to deal with these issues is di¬ 
mensionality reduction, which has been a ubiquitous 
technique broadly applied for noise reduction, data 
compression, and feature extraction. Indeed, most mod¬ 
els and algorithms we discussed above rely on efficient 
and robust dimensionality reduction. 
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TABLE 3: Comparison of popular dimensionality reduction models (c.f. |[^). Y G is the data matrix of 

T samples in /-dimensional space, X = AB^ is the extracted low-rank data, the sparse matrix E denotes the 
outliers. Laplacian matrix $ characterizes a simple graph or a hypergraph between the samples of Y | [^ , 1/2,i 
norm is defined as ||Y||2 ^ llytIl2- space limitation, most of their nonnegative variations are omitted 

here as they are straightforward, and their multiway extension can be found in |[^- |1QQ[ and references therein. 


Model 

Objective 

Constraints 

PCA 

minA.B ||Y-ABT||^ 

A^A = I 

RPCA 

95 


minx, E X k + A E ^ 

Y = X + E, X = ABT 

RPCA on graphs 

97 


minx, E XL: + A E + 7 trace(X^X"^) 

Y = X + E, X = AB^ 

Graph Laplacian PCA (GLPCA) 

101 


minA, B ||Y-ABT7+7trace(BT^B) 

B^^B = I 

Robust GLPCA 

101 


minA, B ||Y-AB^II^ ^+7trace(BT#B) 

B^^B = I 

NMF 1 

26 


minA.B IIY-ABT 7 

A and B are nonnegative 

Orthogonal Nh 

IF 1 

32 


33 


minA.B ||Y-ABT||^ 

A and B are nonnegative, and B^B = I 

Sparse component analysis 

miiiA.B ||Y-ABT||^ + A||B||i 

Various constraints, e.g. nonnegativity 

Smooth component analysis (SmCA) 

34 

minA.B ||X-ABT III, + 7l||LlA||2,i + 

72 L 2 B 27 

Various constraints, e.g., nonnegativity. 
Li,L 2 represent difference operators. 


The key to achieve dimensionality reduction is the 
low-rank nature of data that is essentially related to 
the complexity of the generative model. Estimating 
this rank, or equivalently, the number of latent com¬ 
ponents, is often a challenging task and referred to 
as the model selection problem. As an example, in 
brain data imaging we never know how many latent 
sources that contribute essentially to the measured data; 
different model order selections may lead to quite 
different latent components and, consequently, different 
interpretations. 

A successful model order selection largely relies on 
appropriate assumption on the distribution of noises. 
Gaussian noise is usually assumed in many imaging 
systems ||^, which often results in a least squares 
problem and the use of PC A introduced in Section 2. To 
deal with grossly corrupted observations, robust PCA 
(RPCA) was recently proposed and attracted consider¬ 
able attention. In RPCA, the noisy data Y is modeled 
as the sum of a low rank term X = AB^ and a sparse 
term E of outliers, leading to the following optimization 
problem [[^[: 


min rank(X) + 7 ||E||o , s.t. Y = X + E. (32) 

X,E 

However, ( |32| ) is a highly nonconvex optimization prob¬ 
lem that is difficult to solve; a tractable surrogate of 
( |3^ is obtained by replacing rank(X) by the nuclear 
norm ||X||^ = ai{X) and ||E||q by ||E||^. By solving this 
surrogate problem, both low-rank approximation and 
the outliers can be estimated. Detailed discussions on 
this topic can be found in and references therein. 

A closely related problem, known as matrix comple¬ 
tion (MC), also utilizes the low-rank feature of data and 
can be modeled as 


nhn rank(X), s.t. Xq = Y^, (33) 


where Q indicates location of available entries of Y, Xq 
is a matrix with = Xij if (i,jf) G ft; otherwise 

0. Apparently, if we know the location of outliers in 
( |32| ), problem ( |3^ is equivalent to ( |33| because we can 
simply mark the items corrupted by outliers as missing 
values. We refer readers to |[^| for a comprehensive 
overview, and here, focus on tensor completion. Tensor 
completion has many important applications: 1) it al¬ 
lows us to perform effective data analysis even if the 
data is incomplete or corrupted by outliers; however, 
these incomplete data are often simply discarded us¬ 
ing traditional techniques; 2) it allows us to predict 
the missing items based on partially observed values, 
which is known as collaborative filtering and has been 
widely applied in recommender systems; 3) it promises 
exact recovery of signals from very limited samples. 
This feature makes the completion technique some¬ 
what related to compressed sensing p^ . However, 
completion mainly makes use of the low-rank property 
whereas the compressed sensing employs usually the 
sparsity of latent signals under a specific dictionary. 

Extension of ( |32| ) and ( |33| ) to the tensor domain is 
rather straightforward if the Tucker model is used to 
model the data. Eor example, rank(X) can be simply 
replaced by the weighted sum of multilinear ranks of 
tensor X, and then outlier detection and completion 
can be achieved by minimizing the multilinear rank. 
Efficient variations and algorithms can be found in 
|1Q2| -| [TQ41 ; a recent trend is to handle both outliers 
and missing values using a unified framework 
Among these methods, the infTucker method 
assumes that the multilinear rank is given and the other 
methods can learn the multilinear rank but introduce 
new tuning parameters whose selection is often as 
difficult as the model order selection itself. 

The rank minimization framework faces big chal- 
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(a) CIFA for multi-block matrices 
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(b) CIFA for multi-block tensors 


Fig. 6: Illustration of common and individual feature 
extraction (CIFA). (a) The matrix case: the matrix B de¬ 
notes the common (or highly correlated) features shared 
by all data blocks while B/e are the individual features 
specific for each individual block, (b) The tensor case: 
CIFA is performed with respect to mode-1, where the 
matrix B^^^ denotes the common features while B^^^ 
are the individual features. The multilinear structure of 
S/j. makes the tensor model more general and flexible. 


lenges if the CP model is used to model the data, 
because even determining the CP-rank of a given 
tensor is NP-hard. If the CP-rank is given, polyadic 
factorization with missing values has been developed 
by employing CP weighted optimization (CPWOPT) 
| |105[ and nonlinear least squares (CPNLS) ]1Q6| . The 
Bayesian CP developed in I pTOT) can infer the latent CP- 
rank, but it suffers from the weak point that the missing 
data are handled in a heuristic way. Recently, a unified 
framework was proposed in ||^, where the given 


data tensor ^ (can be incomplete) is modeled as 


y = X + N+8, (34) 


X = |A; ..., A^^^], tensor AT is the Gaussian 

noise, £ denotes the outliers. To infer the latent CP- 
rank R, at the very beginning, R will be assigned a 
sufficiently large value and then it shrinks gradually 
during the iteration process by enforcing column-wise 
sparsity of factor matrices. This method is attractive due 
to its full Bayesian treatment which is able to simultane¬ 
ously handle outliers, Gaussian noise, and incomplete 
data uniformly with automatic model order selection. 
Particularly, it is free of any tuning parameters. 

As a side comment, we would like to emphasize that 
outliers are not always treated as noise. For instance, in 
video surveillance analysis, outliers can be used to iden¬ 
tify the activities that stand out from the background 
Therefore, an alternative interpretation of robust 
tensor factorization is to decompose an incomplete ten¬ 
sor as a low-rank tensor, capturing global information 
and predicting missing values, and a sparse tensor, 
capturing local information. Hence, the power of the 
methods we introduced in this section are not limited 
to optional pre-processing procedures. Instead, many of 
them can be applied independently for different types 
of applications. Dimensionality reduction itself is a very 
active research topic, many models and approaches 
have been proposed for different application purposes. 
While readers may find comprehensive reviews in 
]1Q8|| , we summarized some of them for quick reference 
in TABLED 


It can be seen that the low-rank nature of data is the 
foundation of the above models and approaches. This 
low-rank nature allows a very compact representation 
of large-scale structured raw data, upon which highly 
scalable and efficient algorithms can be developed [ |^ , 
]1Q9| |. To meet the ongoing challenge created by big 
data analytics, new data models and optimization al¬ 
gorithms are required. In this regard, tensor networks 
were recently proposed to achieve the so-called super¬ 
compression (see |11Q| , |111J and references therein). 
Randomization and sparsification are emerging tech¬ 
niques that provide favorable scalability for dimension¬ 
ality reduction. In the first category, the given huge data 
tensor can be compressed via either random projections 
|TT^ , | [TT3| or random sampling B- In the second 
category, the dense data tensor is converted to a sparse 
one via an element-wise sparsification algorithm |115| . 
In the authors argued that first-order methods, 

randomization, and parallel and distributed computa¬ 
tion are three key pillars and offer surprising scalability 
benefits for big data analysis. Readers are referred to 
ITTol , |116| and references therein for this ongoing 
research topic. 
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6 Illustrative Examples 

6.1 Applications of CIFA in SSVEP-based BCI 

Steady-state visual evoked potentials (SSVEP) are peri¬ 
odic neural responses elicited at the same frequency as 
a visual stimulus when a subject is excited at a specific 
frequency by the visual stimulus. An SSVEP-based BCI 
is designed to detect the target stimulus by recognizing 
the dominant frequency from SSVEP data, and then 
to translate it into a pre-assigned computer command. 
Although real SSVEPs present relatively stable spec¬ 
tra as long as the recording time is sufficiently long, 
they are often contaminated by background sources of 
noise. Thus, efficient and accurate recognition of SSVEP 
frequencies is essential to achieve high performance in 
SSVEP-based BCls. As one of the most widely adopted 
methods, CCA identifies each SSVEP target frequency 
by locating the maximum correlation between EEC 
recordings and a set of artificial references, each of 
which corresponds to a candidate frequency fn^ . 

While CCA gained popularity and provides reason¬ 
ably good performance in SSVEP frequency recogni¬ 
tion, it suffers from two major limitations: it ignores the 
multiway structure of EEC data and particularly, uses 
artificial references typically constructed by simple sine 
and cosine waves rather than real SSVEP features. Re¬ 
cently, by effectively exploiting the multiway feature of 
EEC data, a multiway extension of CCA (MWCCA) has 
been developed to improve the performance of SSVEP 
recognition The MWCCA approach collabora- 

tively maximizes correlation between multiple dimen¬ 
sions of EEC tensor data and the pre-constructed refer¬ 
ence sine-cosine waves at specific stimulus frequencies, 
thereby leading to improved performance against the 
standard CCA. On the other hand, a group of EEC 
data trials recorded at the same stimulus frequency 
should share some common features reflecting this fre¬ 
quency information. Such common features extracted 
from EEC training data are supposed to bear real 
SSVEP characteristics and hence are more qualified as 
references for SSVEP recognition (see Pig|^. With this 
intuition, MCCA was proposed to optimize the SSVEP 
references l^ , where the common features extracted 
from multiple training sets of EEC data recorded at 
the same stimulus frequency were chosen as the ref¬ 
erences. A solution that does not use any reference 
signal but makes use of the statistical dependence of 
SSVEP among multiple electrodes has been very re¬ 
cently proposed using IVA and has noted to provide 
quite promising performance ]119| provided that the 
time window is sufficiently long. 

Also, we investigated the application of CIPA to learn 
real SSVEP features in order to further improve the 
accuracy of SSVEP recognition. In this experiment, we 
validated effectiveness of CIPA in SSVEP recognition 
using the EEC dataset recorded from ten subjects. 
During the EEC recordings, four red squares were pre- 




Pig. 8: Illustration of SSVEP recognition based on mul¬ 
tiway or multiset learning models. Consider multiple 
tensors (channel x time x trial) A'l,..., constructed 
by the EEC data corresponding to K distinct target 
frequencies of /i,..., /x, respectively. The common fea¬ 
tures extracted by using MWCCA, MCCA or CIPA are 
used as references for SSVEP recognition. The SSVEP 
target frequency of new test data for a single trial is then 
recognized according to the maximum of correlation 
coefficients pi,..., Pic between the test data and the 
optimized references. 


sented on the screen as stimuli flickered with different 
frequencies. Each subject completed 20 runs. In each 
run, each stimulus was cued as the target followed 
by flickering of the four stimuli at four different fre¬ 
quencies: 6, 8, 9 and 10 Hz. Subjects were asked to 
focus their attention on the target for l-4s after each 
cue. A total of 80 trials data were acquired from each 
subject. More details about the experiment settings 
can be found in | |118| . Three state-of-the-art methods, 
i.e., CCA, MWCCA and MCCA were compared with 
CIPA. Pig|^ depicts the averaged recognition accuracies 
evaluated by leave-one-run-out cross-validation, from 
which we see that CIPA achieved much higher average 
accuracy than the others for all time windows from 0.5s 
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Time window (s) 

Fig. 9: SSVEP recognition accuracies averaged on sub¬ 
jects, derived by CCA, MWCCA, MCCA and CIFA 
respectively, over a time window from 0.5 s to 4 s (0.5 
s interval). 

to 4s. 


Noisy MRI (cr = 10%, SNR = 7dB) 



Denoised MRI (PSNR = 36dB) 



Fig. 10: MRI tensor is of size 181 x 217 x 165. The cube 
size is 4 X 4 X 4 and the number of similar cubes is 30. 


6.2 MRI denoising 

MRI is a non-invasive high resolution multi-parameter 
imaging modality that is widely used in current clinical 
diagnosis and scientific research because of its ability to 
reveal the 3D internal structures of objects. However, 
MRI may suffer from serious effects such as random 
noise, specifically when high-resolution and high-speed 
is required. The denoising of MRI is thus important 
to improve the inspection quality and reliability of 
quantitative image analysis. Nonlocal filters by ex¬ 
ploiting similarity and sparseness among cubes have 
been demonstrated to be effective in denoising MRI. 
Recently, HOSVD has been applied to image denoising 
by exploiting redundancy in the 3D stack of simi¬ 
lar patches ||^. In addition, the augmented HOSVD 
method has been investigated and applied to MRI 
volume data |120|. One common limitation of existing 
denoising methods is that the noise level is assumed 
to be known in advance. Therefore, it is appealing to 
perform MRI denoising automatically without needing 
tuning of the parameters. To achieve this, we apply a re¬ 
cently proposed Bayesian CP factorization (BCPF) 
method for MRI denoising. The procedure is described 
briefly as follows. Given an / x / x / reference cube 
in the noisy MRI data, K similar cubes (including the 
reference cube) are found and stacked into a 4D array of 
size / X / X / X A, which is thus filtered by BCPF. Finally, 
the denoised MRI is obtained by aggregating multiple 
estimations at each location. Since BCPF can estimate 
CP-rank efficiently and the noise variance automatically 
from given data, it can be easily applied to filtering 4D 
array instead of HOSVD. For illustration, we use the 


public MRI data[^ and consider Gaussian noise whose 
standard derivation is 10% of brightest tissue. Fig. 
shows the results of MRI denoising with an impressive 
performance of PSNR= 36dB. 

6.3 MRI completion 

Missing data are a common problem in MRI studies 
due in large part to susceptibility artifact and image 
acquisition parameters that result in incomplete brain 
coverage and spatial variation in acquired images, 
across subjects. In addition, MRI completion can be 
also used to improve spatial and temporal resolution 
by reconstructing the whole image from very few 
measurements. In this experiment, we illustrate MRI 
completion by applying the Bayesian tensor comple¬ 
tion (BTC) method [ |1QQ[ , particularly to improve the 
generalization performance when original data do not 
posses a global low-rank structure. Several recently 
proposed tensor completion methods including the 
high accuracy low rank tensor completion (HaLRTC) 
method the generalized higher-order orthogonal 

iteration (gHOOI) method [|121| , and the one based on 
weighted Tucker (WTucker) decomposition )l22| were 
also used for comparisons. For HaLRTC and WTucker, 
the tuning parameters were selected to yield the best 
possible performance. BTC methods as well as gHOOI 
can automatically adapt the tensor rank to data. As 
shown in Table BTC achieves the best performance 
under the conditions studied here. When missing ratio 

1. Dataset is available from http://brainweb.bic.mni.mcgill.ca/ 
brainweb/ 
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Fig. 11: Tensor completion for noisy MRI data. The stan¬ 
dard derivation of Gaussian noise was 3% of brightest 
tissue. MRI tensor is of size 181 x 217 x 165. 


TABLE 4: The performance of MRI completion evaluated by 
PSNR and RRSE under missing ratios of 50% and 80%. 


Methods 

50% missing 

80% missing 

PSNR 

RRSE 

PSNR 

RRSE 

BTC 

26.40 

0.12 

22.33 

0.19 

WTucker 

23.18 

0.18 

21.53 

0.21 

gHOOI 

21.57 

0.21 

19.77 

0.26 

HaLRTC 

23.12 

0.18 

17.06 

0.36 


is low (e.g., 50%), HaLRTC outperforms gHOOI, while 
gHOOI outperforms HaLRTC when the missing ratio 
is high (e.g., 80%). The visual quality of reconstructed 
MRI obtained by BTC is shown in Eig. 

7 Discussion, Conclusions and Future 
Perspectives 

Multi-set or multi-block coupled and inhomogeneous 
data are becoming ubiquitous across the sciences and 
engineering, especially in the biomedical field. Large 
scale medical/biological data using modalities such 
as EEC, fMRI, ECoG critically require scalable algo¬ 
rithms to efficiently process massive datasets within 
reasonable amount of time. To handle these issues, 
tensors, multi-dimensional generalizations of matrices, 
are attractive and promising tools for the representation 
and processing of such data. 

In this paper, we have studied analysis of such 
multi-relational or multi-block data using simultaneous 
(coupled) constrained matrix and tensor factorizations 
that allow one to extract common and individual com¬ 
ponents and/or to establish links between them. We 
critically reviewed several useful matrix and tensor 
decomposition models for linked multiway analysis (or 
multilinear blind source separation). 

Eor large-scale multi-relational data analysis, tensor 
decompositions is one emerging technology. Current 
applications in many areas of science and technol¬ 
ogy, such as computational neuroscience, bioinformat¬ 


ics, and pattern/image recognition, generate massive 
amounts of data with multiple aspects and high di¬ 
mensionality, and a number of successful examples 
have demonstrated that tensor decompositions with 
low-rank approximations can provide often useful and 
meaningful approximate representations for such large 
scale applications |110| , |123| . Tensor and matrix de¬ 
compositions, in particular using the linked models 
introduced here, allow us to discover meaningful hid¬ 
den structures of complex brain data and perform 
generalizations by capturing multi-linear and multi¬ 
aspect relationships. To serve this purpose, flexible 
CIEA models were introduced to simultaneously model 
common features shared by all data sources and indi¬ 
vidual features that are specific to each data source. In 
BSS, subspace definitions for a joint BSS framework as 
recently introduced [124| , ]125| can be used to define 
such common and distinct subspaces within IVA and 
the concentrated (hence linked) ICA models like group 
and joint ICA. Eor all those methods, a number of 
challenges exist for very large scale data, feature extrac¬ 
tion/selection, classification, clustering and anomaly 
detection. A recent paper addresses the selection of 
model order when the sample size is relatively small by 
specifically considering common and distinct subspaces 
as in CIFA ll26|| . 


Another closely related approach for big data not 
discussed in this paper is tensor networks. Complex 
interactions and operations between tensors can be 
visualized by tensor network diagrams in which tensors 
are represented graphically by nodes or shapes [IllOL 
Urn] . In such representations, each outgoing edge (line) 
emerging from a node represents a mode (a way, a 
dimension, indices). Tensor network diagrams could 
be potentially very helpful not only in visualizing 
tensor decompositions but also for expressing complex 
mathematical (multilinear) operations of contractions 
of tensors miol , (nVj . Tensor networks are connected 
to quantum physics, quantum chemistry and quantum 
information, which studies the ways to possibly build 
a quantum computer and to program it |127| . In ad¬ 
dition, tensor networks provide graphically illustrative 
large distributed networks and enable one to perform 
complex tensor operations (i.e., tensor contractions and 
reshaping) in an intuitive way and without the need to 
use explicit mathematical representation. 

To summarize, the benefits of linked multiway (ten¬ 
sor) analysis methods along with matrix-based ap¬ 
proaches for massive biomedical data include: 

• A framework to incorporate different type of di¬ 
versity or various constraints in different modes 
or factors, and naturally extending the standard 
component analysis and BSS methods to large- 
scale joint BSS as well as multiway data analyses; 

• Possibility to handle noisy and missing data natu¬ 
rally by using powerful low-rank tensor approxi¬ 
mations and by exploiting robustness and stability 
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of tensor decomposition algorithms; 

• ''Super" compression of large-scale multidimen¬ 
sional data via tensorization and decomposition 
of a high-order tensor into factor matrices and/or 
core tensors with low-rank and low-order i), (n^; 

• Ability to perform all mathematical operations in 
feasible tensor network formats, with the ability to 
improve the scalability of algorithms |11Q| . 

Hence, linked component analysis is a very promis¬ 
ing area of research with substantial potentials in 
biomedical applications. 
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